Learning From Imbalanced Data: Rank Metrics and Extra Tasks
Abstract
Imbalanced data creates two problems for machine learning. First, even when the training set is large, the smaller classes may contain few examples, and learning accurate models from small samples is hard. Multitask learning is one way to learn more accurate models from small samples, and it is particularly well suited to imbalanced data. Second, the usual error metrics (e.g., accuracy or squared error) cause learning to pay more attention to large classes than to small ones. This problem can be mitigated by careful selection of the error metric. We find that rank-based error metrics often perform better when an important class is under-represented.
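To make the point about error metrics concrete, the following is a minimal sketch, not the paper's experimental setup. It compares accuracy with one common rank-based metric, the area under the ROC curve, on a hypothetical 95/5 class split. The toy data, the two score vectors, and the helper names `accuracy` and `roc_auc` are all assumptions introduced for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 95 majority-class (0) and 5 minority-class (1) examples.
labels = [0] * 95 + [1] * 5

# Model A gives every example the same low score, ignoring the minority class.
constant_scores = [0.1] * 100

# Model B ranks every positive above every negative, but keeps all scores
# below the 0.5 decision threshold.
ranking_scores = [0.2] * 95 + [0.4] * 5

acc_a = accuracy(labels, constant_scores)  # 0.95
auc_a = roc_auc(labels, constant_scores)   # 0.5
acc_b = accuracy(labels, ranking_scores)   # 0.95
auc_b = roc_auc(labels, ranking_scores)    # 1.0

print(acc_a, auc_a)
print(acc_b, auc_b)
```

In this sketch, accuracy cannot tell the two models apart because the majority class dominates the count, whereas the rank-based metric separates a model that orders the minority class correctly from one that ignores it.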
Related resources
Cost-Sensitive Convolution based Neural Networks for Imbalanced Time-Series Classification
Some deep convolutional neural networks were proposed for time-series classification and class imbalanced problems. However, those models performed degraded and even failed to recognize the minority class of an imbalanced temporal sequences dataset. Minority samples would bring troubles for temporal deep learning classifiers due to the equal treatments of majority and minority class. Until rece...
Investigation of Term Weighting Schemes in Classification of Imbalanced Texts
Class imbalance problem in data, plays a critical role in use of machine learning methods for text classification since feature selection methods expect homogeneous distribution as well as machine learning methods. This study investigates two different kinds of feature selection metrics (one-sided and two-sided) as a global component of term weighting schemes (called as tffs) in scenarios where...
Cost-Sensitive Support Vector Ranking for Information Retrieval
In recent years, the algorithms of learning to rank have been proposed by researchers. However, in information retrieval, instances of ranks are imbalanced. After the instances of ranks are composed to pairs, the pairs of ranks are imbalanced too. In this paper, a cost-sensitive risk minimum model of pairwise learning to rank imbalanced data sets is proposed. Following this model, the algorithm...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique given a wide amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge number of data but almost more of them are unlabeled. It is effective in image classification where it is expensive and time-consuming to obtain adequate label data. We propose a novel method named DALRRL, which consists of deep ...
Publication date: 2003